{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon #5\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics: \n",
    "- Transfer learning\n",
    "    - Changing the final layers of an existing network\n",
    "    - Stopping gradients and freezing layers\n",
    "- Batch Normalization\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add cells to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import os  # to work with file paths\n",
    "\n",
    "import tensorflow as tf         # to specify and run computation graphs\n",
    "import numpy as np              # for numerical operations taking place outside of the TF graph\n",
    "import matplotlib.pyplot as plt # to draw plots\n",
    "\n",
    "cifar_dir = '/work/cse496dl/shared/hackathon/05/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# load CIFAR-10\n",
    "train_images = np.load(cifar_dir + 'cifar10_train_data.npy')\n",
    "train_images = np.reshape(train_images, [-1, 32, 32, 3]) # `-1` means \"everything not otherwise accounted for\"\n",
    "train_labels = np.load(cifar_dir + 'cifar10_train_labels.npy')\n",
    "\n",
    "test_images = np.load(cifar_dir + 'cifar10_test_data.npy')\n",
    "test_images = np.reshape(test_images, [-1, 32, 32, 3]) # `-1` means \"everything not otherwise accounted for\"\n",
    "test_labels = np.load(cifar_dir + 'cifar10_test_labels.npy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Transfer Learning\n",
    "\n",
    "Let's imagine you have an image classification task that's doable, but don't have a large enough dataset with which to train a network without overfitting. What to do? The answer is to use the convolutional stack of a powerful, existing network that you (or Google) has trained on a broad task like classifying [ImageNet](http://www.image-net.org/) ([Wikipedia](https://en.wikipedia.org/wiki/ImageNet)) and only train a few dense layers on top of it for classification. You get the benefit of good feature extraction without the danger of overfitting such a large network on your small dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# pull out classes 0 and 2 (airplanes and birds respectively)\n",
    "train_idxs = np.union1d(np.where(train_labels == 0), np.where(train_labels == 2))\n",
    "test_idxs = np.union1d(np.where(test_labels == 0), np.where(test_labels == 2))\n",
    "# the subsets we'll be working with\n",
    "# and handle an off by one error I haven't been able to diagnose\n",
    "subset_train_data = train_images[train_idxs][1:]\n",
    "subset_train_labels = train_labels[train_idxs][1:]\n",
    "subset_test_data = test_images[test_idxs][1:]\n",
    "subset_test_labels = test_labels[test_idxs][1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load a model that's been trained on a similar classification task (the other part of CIFAR-10), and see how it performs. The first thing we have to do is find the tensors we need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "session = tf.Session()\n",
    "saver = tf.train.import_meta_graph(cifar_dir + 'cifar10_network.meta')\n",
    "saver.restore(session, cifar_dir + 'cifar10_network')\n",
    "print(session.graph.get_operations())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a huge list of operations, but it looks like the operations of interest are `<tf.Operation 'input_placeholder' type=Placeholder>`, `<tf.Operation 'conv_net_2d_1/Relu' type=Relu>`, and `<tf.Operation 'mlp_1/linear_0/add' type=Add>`. The moral of this story is that you should always name your tensors consistently and document your networks clearly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "graph = session.graph\n",
    "x = graph.get_tensor_by_name('input_placeholder:0')\n",
    "conv_out = graph.get_tensor_by_name('conv_net_2d_1/Relu:0')\n",
    "mlp_out = graph.get_tensor_by_name('mlp_1/linear_0/add:0')\n",
    "\n",
    "y = tf.placeholder(tf.int32, shape=[None])\n",
    "conf_matrix_op = tf.confusion_matrix(y, tf.argmax(mlp_out, axis=1), num_classes=10)\n",
    "conf_matrix = session.run(conf_matrix_op, {x: subset_test_data, y: np.squeeze(subset_test_labels)})\n",
    "print(conf_matrix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This network doesn't classify any of our data correctly. So we're going to replace the classifying dense layer and train it on our data. We'll make sure that the weights in the convolutional stack don't change with [tf.stop_gradient](https://www.tensorflow.org/api_docs/python/tf/stop_gradient). This operation is a special version of the identity that copies data in the forward pass, but stops gradients in the backward pass. This allows the later parts of the network to update while preserving everything preceding, up to the inputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "conv_out_no_gradient = tf.reshape(tf.stop_gradient(conv_out), [-1, 16*16*32]) # flatten input from 16x16x32\n",
    "our_dense_layer = tf.layers.dense(conv_out_no_gradient, 2, name=\"our_dense_layer\")\n",
    "conf_matrix_op = tf.confusion_matrix(y, tf.argmax(our_dense_layer, axis=1), num_classes=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll also use [tf.get_collection](https://www.tensorflow.org/api_docs/python/tf/get_collection) to get the variables we've added for initialization. We want to make sure not to run the global variable initializer, because that would reset the variables from the values loaded from the saved model, so we'll use [tf.variables_initializer](https://www.tensorflow.org/api_docs/python/tf/variables_initializer) to only initialize the subset we've added. `tf.get_collection` uses [GraphKeys](https://www.tensorflow.org/api_docs/python/tf/GraphKeys) to get subsets of the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# change data to be a binary classification probem between airplanes and birds (0 and 1)\n",
    "subset_train_labels[subset_train_labels > 0] = 1\n",
    "subset_test_labels[subset_test_labels > 0] = 1\n",
    "\n",
    "with tf.name_scope('optimizer') as scope:\n",
    "    # define new loss\n",
    "    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=our_dense_layer)\n",
    "    optimizer = tf.train.AdamOptimizer()\n",
    "    train_op = optimizer.minimize(cross_entropy)\n",
    "\n",
    "# initialize new variables\n",
    "optimizer_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, \"optimizer\")\n",
    "dense_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, \"our_dense_layer\")\n",
    "session.run(tf.variables_initializer(optimizer_vars + dense_vars, name='init'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we'll train for one epoch and and evaluate like we usually do to see the effect of our transfer learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# train for an epoch\n",
    "batch_size = 16\n",
    "for i in range(subset_train_data.shape[0] // batch_size):\n",
    "    batch_xs = subset_train_data[i*batch_size:(i+1)*batch_size, :]\n",
    "    batch_ys = np.squeeze(subset_train_labels[i*batch_size:(i+1)*batch_size])\n",
    "    session.run(train_op, {x: batch_xs, y: batch_ys})\n",
    "\n",
    "# and evaluate\n",
    "conf_matrix = session.run(conf_matrix_op, {x: subset_test_data, y: np.squeeze(subset_test_labels)})\n",
    "print('BINARY CONFUSION MATRIX:')\n",
    "print(conf_matrix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not bad for such a difficult classification problem and small dataset. Notice that we go from a 10 class problem to a 2 class one. The output can be totally different from the original, but transfer learning is most useful when in the same modality (RGB vs. greyscale or natural images vs. audio, etc.)\n",
    "\n",
    "Another option, instead of using tf.stop_gradient, is to pass an explicit list of trainable variables to [minimize](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer#minimize). Normally, the `var_list` argument is set to all the variables under `GraphKeys.TRAINABLE_VARIABLES`, but we can set it just to be the variables we've added."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# this only trains a few variables\n",
    "train_op = optimizer.minimize(cross_entropy, var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, \"our_dense_layer\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hackathon 5 Exercise\n",
    "\n",
    "Add an [l2 regularizer](https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) to the weight matrix of the old dense layer `<tf.Operation 'mlp/linear_0/w' type=VariableV2>`, then get a train_op with `AdamOptimizer.minimize` that trains that layer (`linear_0`), with the new regularizer on the cross entropy loss of its output with the correct label.\n",
    "\n",
    "Hints:\n",
    "\n",
    "1. Remember that 'mlp/linear_0/w' is the name of an Operation, not a Tensor\n",
    "2. We retrieved the output of the layer above as `mlp_out`\n",
    "3. No code needs to run, it just needs to be correct"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Batch Normalization\n",
    "\n",
    "We're going to briefly cover how to use batch norm; see Chapter 11 of textbook for more info on batch normalization. It's used to deal with internal covariate shift, or the tendency of the mean activation of a layer over the dataset to drift away from 0. Keeping this close to zero has an effect similar to normalization of the data in terms of making the learning problem easier and less sensitive to initial conditions. Generally normalization occurs just before the activation of a layer.\n",
    "\n",
    "TensorFlow implements batch norm with [tf.layers.batch_normalization](https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization). It only requires the input tensor, but may be customized with many, many optimal parameters. It's important to set whether the network is training or not with the `training` argument because, to quote the TensorFlow docs:\n",
    "\n",
    "> training: Either a Python boolean, or a TensorFlow boolean scalar tensor (e.g. a placeholder). Whether to return the output in training mode (normalized with statistics of the current batch) or in inference mode (normalized with moving statistics). NOTE: make sure to set this parameter correctly, or else your training/inference will not work properly.\n",
    "\n",
    "Batch normalization maintains internal statistics (its own parameters) on how to normalize between minibatches. It's important that these are updated as training proceeds, but equally important that they're not updated when the network layers aren't being updated as well.\n",
    "\n",
    "In order to ensure the parameters get updated in the correct order, dependencies should be explicitly signaled. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:\n",
    "\n",
    "```\n",
    "update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
    "with tf.control_dependencies(update_ops):\n",
    "    train_op = optimizer.minimize(loss)\n",
    "```\n",
    "\n",
    "The `axis` argument should be set if channels isn't the last dimension of your imagetensors.\n",
    "\n",
    "An example use in a convolutional block, also using the [tf.nn.elu](https://www.tensorflow.org/api_docs/python/tf/nn/elu) activation (closely related to ReLU):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tf.reset_default_graph()\n",
    "def my_conv_block(inputs, filters, is_training):\n",
    "    \"\"\"\n",
    "    Args:\n",
    "        - inputs: 4D tensor of shape NHWC\n",
    "        - filters: iterable of ints\n",
    "    \"\"\"\n",
    "    with tf.name_scope('conv_block') as scope:\n",
    "        x = inputs\n",
    "        for i in range(len(filters)):\n",
    "            x = tf.layers.conv2d(x, filters[0], 3, 1, padding='same')\n",
    "            x = tf.layers.batch_normalization(x, training=is_training)\n",
    "            x = tf.nn.elu(x)\n",
    "    return x\n",
    "\n",
    "x = tf.placeholder(tf.float32, [None, 32, 32, 3])\n",
    "conv_output = my_conv_block(x, [16, 32, 64], True)\n",
    "flat_conv = tf.reshape(conv_output, [-1, 32*32*64])\n",
    "output = tf.layers.dense(flat_conv, 10)\n",
    "\n",
    "y = tf.placeholder(tf.int32, [None])\n",
    "cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=output)\n",
    "update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
    "with tf.control_dependencies(update_ops):\n",
    "    train_op = optimizer.minimize(cross_entropy)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "TensorFlow 1.12 (py36)",
   "language": "python",
   "name": "tensorflow-1.12-py36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
